The analysis examines mental and physical health data from 2000 to 2019 drawn from several sources, chiefly the World Health Organization (WHO).
The goal is to analyze health data to gain insight into current consumer health patterns, globally and in Kenya, that can inform data-driven decisions.
-Company founders and C-suite teams.
-Human resource and mental health professionals.
-What is the trend in global and local consumer mental and physical health?
-How can these trends influence public and corporate strategies?
A good data source is ROCCC, which stands for Reliable, Original, Comprehensive, Current, and Cited.
-Reliability: HIGH. The data comes from global population sample data sources.
-Originality: LOW. Third-party provider (WHO).
-Comprehensive: HIGH. Several variables are summarized in 1,700 to 10,980 observations covering a period of more than 15 years.
-Current: MID. The data is 3 years old and contains no COVID-19 data, so it may be less relevant today.
-Cited: HIGH. The data comes from a reliable third party that publicly documents its data collection process.
Overall, the dataset is of good quality; however, it is recommended that an updated analysis be done on health trends during and after COVID-19.
The process begins by loading the following packages:
library('tidyverse')
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library('lubridate')
library('ggplot2')
library('dplyr')
library('readxl')
library('tidyr')
library('janitor')
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library('skimr')
library('sqldf')
## Loading required package: gsubfn
## Loading required package: proto
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
## dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: '/opt/X11/lib/libSM.6.dylib'
## Referenced from: '/Library/Frameworks/R.framework/Versions/4.2/Resources/modules/R_X11.so'
## Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file), '/Library/Frameworks/R.framework/Resources/lib/libSM.6.dylib' (no such file), '/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server/libSM.6.dylib' (no such file)
## tcltk DLL is linked to '/opt/X11/lib/libX11.6.dylib'
## Could not load tcltk. Will use slower R code instead.
## Loading required package: RSQLite
library('plotrix')
library('knitr')
library('reshape2')
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
The next step is identifying the current working directory.
getwd()
## [1] "/Users/admin/Downloads/Mental health 2023./CSV"
Afterwards, set the working directory to simplify data calls.
setwd("/Users/admin/Downloads/Mental health 2023./CSV")
The last step of the data retrieval process is identifying the data files and loading them.
dp <- read.csv("prevalence-of-depression-males-vs-females.csv")
ax <- read.csv("prevalence-of-anxiety-disorders-males-vs-females.csv")
bp <- read.csv("prevalence-of-bipolar-disorder-in-males-vs-females.csv")
ed <- read.csv("prevalence-of-eating-disorders-in-males-vs-females.csv")
sz <- read.csv("prevalence-of-schizophrenia-in-males-vs-females.csv")
sr <- read.csv("suicide_rates_2019.csv")
ab <- read.csv("share-with-alcohol-and-substance-use-disorders 1990-2016.csv")
health <- read.csv("data.csv")
rd <- read.csv("road_death_2019.csv")
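As an alternative to changing the working directory, the files could be read through a named vector of file names and a base path. The following is only a sketch under stated assumptions: the file names, contents, and the ‘data_dir’ folder are invented for the demonstration and are not the files used in this analysis.

```r
# Hypothetical demo files written to a temporary folder (invented values).
data_dir <- tempdir()
write.csv(data.frame(Entity = "Kenya", Year = 2019, value = 1.2),
          file.path(data_dir, "demo_depression.csv"), row.names = FALSE)
write.csv(data.frame(Entity = "Kenya", Year = 2019, value = 3.4),
          file.path(data_dir, "demo_anxiety.csv"), row.names = FALSE)

# Map short names to file names, then read each file without calling setwd().
files <- c(dp = "demo_depression.csv", ax = "demo_anxiety.csv")
frames <- lapply(files, function(f) read.csv(file.path(data_dir, f)))

names(frames)    # the short names are carried over to the list
frames$dp$value  # access one data frame by its short name
```

Keeping the data frames in a named list also makes it possible to apply the same cleaning step to all of them with one call.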
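Rows containing missing (NA) values were removed from the data frames and unneeded columns were dropped. The ‘health’ data frame only had columns dropped, so it keeps one row with missing values.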
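# Drop rows with missing values, then drop the eighth column where present.
ax <- na.omit(ax)
ax <- ax[, -8]
bp <- na.omit(bp)
bp <- bp[, -8]
dp <- na.omit(dp)
dp <- dp[, -8]
ed <- na.omit(ed)
ed <- ed[, -8]
sz <- na.omit(sz)
sz <- sz[, -8]
ab <- na.omit(ab)
sr <- na.omit(sr)
# Drop columns 6 to 8 of the health data.
health <- health[, -(6:8)]
rd <- na.omit(rd)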
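Because ‘na.omit’ removes every row that has a missing value in any column, it can be worth checking how many rows are lost. A minimal base R sketch on an invented data frame (the values are made up for illustration):

```r
# Invented demo data with missing values in two different columns.
demo <- data.frame(country = c("Kenya", "Chile", "Angola"),
                   year    = c(2019, 2019, NA),
                   rate    = c(6.1, NA, 8.9))

# Count missing values per column before dropping anything.
colSums(is.na(demo))

# na.omit() keeps only rows that are complete in every column.
cleaned <- na.omit(demo)
rows_dropped <- nrow(demo) - nrow(cleaned)
rows_dropped
```

If a large share of rows is dropped, removing missing values only in the columns actually used may be preferable.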
The ‘clean_names’ function from the janitor package was used to give all variables consistent, well-formed names.
ax <- clean_names(ax)
bp <- clean_names(bp)
dp <- clean_names(dp)
ed <- clean_names(ed)
sz <- clean_names(sz)
ab <- clean_names(ab)
sr <- clean_names(sr)
health <- clean_names(health)
rd <- clean_names(rd)
Several data frames were then joined into one to ease analysis.
agg <- merge(ax, bp, by = c("index","year", "entity", "code", "population_historical_estimates"))
agg <- merge(agg, dp, by = c("index","year", "entity", "code", "population_historical_estimates"))
agg <- merge(agg, ed, by = c("index","year", "entity", "code", "population_historical_estimates"))
agg <- merge(agg, sz, by = c("index","year", "entity", "code", "population_historical_estimates"))
remove(ax,bp,dp,ed,sz)
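By default ‘merge’ performs an inner join, so rows whose keys are missing from any one table drop out silently. The chained merges above could also be written with ‘Reduce’. This is a sketch on invented data frames with hypothetical column names, not the analysis data:

```r
# Invented demo tables sharing the key columns "country" and "year".
keys <- c("country", "year")
a <- data.frame(country = c("Kenya", "Chile", "Angola"), year = 2019,
                anxiety = c(4.1, 5.6, 3.9))
b <- data.frame(country = c("Kenya", "Chile"), year = 2019,
                bipolar = c(0.6, 1.0))
c_df <- data.frame(country = c("Kenya", "Chile", "Angola"), year = 2019,
                   depression = c(4.5, 4.6, 5.2))

# Merge the tables one after another on the shared keys (inner join).
joined <- Reduce(function(x, y) merge(x, y, by = keys), list(a, b, c_df))

nrow(joined)                         # rows that survived every join
setdiff(a$country, joined$country)   # countries lost along the way
```

Comparing row counts before and after the join is a quick way to see how much data the inner join discarded.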
Some columns in datasets were renamed to ease analysis.
agg <- agg %>%
rename(country = entity)
ab <- ab %>%
rename(country = entity)
The suicide data was also summarized to ease analysis.
su.r <- sr %>%
group_by(continent, code, country, year, sex) %>%
summarise(suicide_rate = sum(suicide_rate)) %>%
arrange(continent, code, country, year, sex)
## `summarise()` has grouped output by 'continent', 'code', 'country', 'year'. You
## can override using the `.groups` argument.
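The message above is informational: the result stays grouped by all but the last grouping variable. Passing ‘.groups = "drop"’ to ‘summarise’ returns an ungrouped result and silences the message. A small sketch, assuming dplyr is installed and using invented values:

```r
library(dplyr)

# Invented demo data with the same kind of columns as the suicide data.
demo <- data.frame(country = c("Kenya", "Kenya", "Chile", "Chile"),
                   year = 2019,
                   sex = c("Male", "Female", "Male", "Female"),
                   suicide_rate = c(9.1, 3.2, 13.4, 3.0))

# Summarise per country and year, then drop the grouping explicitly.
by_country <- demo %>%
  group_by(country, year) %>%
  summarise(suicide_rate = mean(suicide_rate), .groups = "drop")

is_grouped_df(by_country)  # FALSE: the result carries no grouping
```

An ungrouped result avoids surprises when later steps such as ‘mutate’ or ‘arrange’ would otherwise operate within groups.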
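The newly joined dataset was then extended with a combined column for each disorder. Note that each of these combined columns is the sum of the male and female age-standardized percentages, so it acts as a combined index rather than a true both-sexes prevalence rate.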
agg <- agg %>%
mutate(prevalence_anxiety_disorders_sex_both = (prevalence_anxiety_disorders_sex_male_age_age_standardized_percent + prevalence_anxiety_disorders_sex_female_age_age_standardized_percent)
, prevalence_bipolar_disorder_both = (prevalence_bipolar_disorder_sex_male_age_age_standardized_percent + prevalence_bipolar_disorder_sex_female_age_age_standardized_percent)
, prevalence_depressive_disorders_both = (prevalence_depressive_disorders_sex_male_age_age_standardized_percent + prevalence_depressive_disorders_sex_female_age_age_standardized_percent)
, prevalence_eating_disorders_both = (prevalence_eating_disorders_sex_male_age_age_standardized_percent + prevalence_eating_disorders_sex_female_age_age_standardized_percent)
, prevalence_schizophrenia_both = (prevalence_schizophrenia_sex_male_age_age_standardized_percent + prevalence_schizophrenia_sex_female_age_age_standardized_percent)) %>%
relocate(prevalence_anxiety_disorders_sex_both, .after = prevalence_anxiety_disorders_sex_female_age_age_standardized_percent) %>%
relocate(prevalence_bipolar_disorder_both, .after = prevalence_bipolar_disorder_sex_female_age_age_standardized_percent) %>%
relocate(prevalence_depressive_disorders_both, .after = prevalence_depressive_disorders_sex_female_age_age_standardized_percent) %>%
relocate(prevalence_eating_disorders_both, .after = prevalence_eating_disorders_sex_female_age_age_standardized_percent) %>%
relocate(prevalence_schizophrenia_both, .after = prevalence_schizophrenia_sex_female_age_age_standardized_percent)
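The combined columns created above add the male and female percentages together. A simple mean of the two would be on the same scale as the inputs, and Pearson correlations with other variables are the same either way, because correlation is unaffected by multiplying a variable by a positive constant. A base R sketch with invented numbers:

```r
# Invented demo values (percentages) for four country-years.
male   <- c(3.56, 2.56, 3.10, 2.90)
female <- c(5.97, 8.69, 5.20, 4.80)
other  <- c(35.3, 12.1, 20.4, 25.0)  # any other numeric health variable

both_sum  <- male + female        # the approach used in this analysis
both_mean <- (male + female) / 2  # alternative on the original scale

# The two versions correlate identically with any third variable.
same_cor <- all.equal(cor(both_sum, other), cor(both_mean, other))
same_cor
```

A properly weighted both-sexes prevalence would additionally need the male and female population sizes, which are not part of these data frames.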
The structure and composition of the summarized data were then inspected, showing the number of rows and columns and the type of each variable.
str(agg)
## 'data.frame': 6150 obs. of 20 variables:
## $ index : int 1 10 10028 10029 10030 10031 10032 10033 10034 10035 ...
## $ year : int 1990 1999 1990 1991 1992 1993 1994 1995 1996 1997 ...
## $ country : chr "Afghanistan" "Afghanistan" "Chile" "Chile" ...
## $ code : chr "AFG" "AFG" "CHL" "CHL" ...
## $ population_historical_estimates : num 12412311 20170847 13274617 13495255 13719818 ...
## $ prevalence_anxiety_disorders_sex_male_age_age_standardized_percent : num 3.56 3.55 2.56 2.57 2.58 ...
## $ prevalence_anxiety_disorders_sex_female_age_age_standardized_percent : num 5.97 5.98 8.69 8.71 8.72 ...
## $ prevalence_anxiety_disorders_sex_both : num 9.53 9.53 11.25 11.28 11.3 ...
## $ prevalence_bipolar_disorder_sex_male_age_age_standardized_percent : num 0.675 0.673 0.949 0.951 0.953 ...
## $ prevalence_bipolar_disorder_sex_female_age_age_standardized_percent : num 0.762 0.761 1.061 1.063 1.064 ...
## $ prevalence_bipolar_disorder_both : num 1.44 1.43 2.01 2.01 2.02 ...
## $ prevalence_depressive_disorders_sex_male_age_age_standardized_percent : num 4.29 4.32 3.18 3.17 3.16 ...
## $ prevalence_depressive_disorders_sex_female_age_age_standardized_percent: num 5.86 5.87 6.02 6.02 6.01 ...
## $ prevalence_depressive_disorders_both : num 10.15 10.19 9.19 9.19 9.17 ...
## $ prevalence_eating_disorders_sex_male_age_age_standardized_percent : num 0.0914 0.0703 0.1702 0.1729 0.176 ...
## $ prevalence_eating_disorders_sex_female_age_age_standardized_percent : num 0.165 0.125 0.396 0.401 0.408 ...
## $ prevalence_eating_disorders_both : num 0.256 0.196 0.566 0.574 0.584 ...
## $ prevalence_schizophrenia_sex_male_age_age_standardized_percent : num 0.244 0.235 0.357 0.358 0.36 ...
## $ prevalence_schizophrenia_sex_female_age_age_standardized_percent : num 0.216 0.206 0.311 0.312 0.313 ...
## $ prevalence_schizophrenia_both : num 0.46 0.441 0.668 0.67 0.673 ...
str(su.r)
## gropd_df [10,980 × 6] (S3: grouped_df/tbl_df/tbl/data.frame)
## $ continent : chr [1:10980] "Africa" "Africa" "Africa" "Africa" ...
## $ code : chr [1:10980] "AGO" "AGO" "AGO" "AGO" ...
## $ country : chr [1:10980] "Angola" "Angola" "Angola" "Angola" ...
## $ year : int [1:10980] 2000 2000 2000 2001 2001 2001 2002 2002 2002 2003 ...
## $ sex : chr [1:10980] "Both sexes" "Female" "Male" "Both sexes" ...
## $ suicide_rate: num [1:10980] 17.56 6.16 29.96 17.46 6.13 ...
## - attr(*, "groups")= tibble [3,660 × 5] (S3: tbl_df/tbl/data.frame)
## ..$ continent: chr [1:3660] "Africa" "Africa" "Africa" "Africa" ...
## ..$ code : chr [1:3660] "AGO" "AGO" "AGO" "AGO" ...
## ..$ country : chr [1:3660] "Angola" "Angola" "Angola" "Angola" ...
## ..$ year : int [1:3660] 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
## ..$ .rows : list<int> [1:3660]
## .. ..$ : int [1:3] 1 2 3
## .. ..$ : int [1:3] 4 5 6
## .. ..$ : int [1:3] 7 8 9
## .. ..$ : int [1:3] 10 11 12
## .. ..$ : int [1:3] 13 14 15
## .. ..$ : int [1:3] 16 17 18
## .. ..$ : int [1:3] 19 20 21
## .. ..$ : int [1:3] 22 23 24
## .. ..$ : int [1:3] 25 26 27
## .. ..$ : int [1:3] 28 29 30
## .. ..$ : int [1:3] 31 32 33
## .. ..$ : int [1:3] 34 35 36
## .. ..$ : int [1:3] 37 38 39
## .. ..$ : int [1:3] 40 41 42
## .. ..$ : int [1:3] 43 44 45
## .. ..$ : int [1:3] 46 47 48
## .. ..$ : int [1:3] 49 50 51
## .. ..$ : int [1:3] 52 53 54
## .. ..$ : int [1:3] 55 56 57
## .. ..$ : int [1:3] 58 59 60
## .. ..$ : int [1:3] 61 62 63
## .. ..$ : int [1:3] 64 65 66
## .. ..$ : int [1:3] 67 68 69
## .. ..$ : int [1:3] 70 71 72
## .. ..$ : int [1:3] 73 74 75
## .. ..$ : int [1:3] 76 77 78
## .. ..$ : int [1:3] 79 80 81
## .. ..$ : int [1:3] 82 83 84
## .. ..$ : int [1:3] 85 86 87
## .. ..$ : int [1:3] 88 89 90
## .. ..$ : int [1:3] 91 92 93
## .. ..$ : int [1:3] 94 95 96
## .. ..$ : int [1:3] 97 98 99
## .. ..$ : int [1:3] 100 101 102
## .. ..$ : int [1:3] 103 104 105
## .. ..$ : int [1:3] 106 107 108
## .. ..$ : int [1:3] 109 110 111
## .. ..$ : int [1:3] 112 113 114
## .. ..$ : int [1:3] 115 116 117
## .. ..$ : int [1:3] 118 119 120
## .. ..$ : int [1:3] 121 122 123
## .. ..$ : int [1:3] 124 125 126
## .. ..$ : int [1:3] 127 128 129
## .. ..$ : int [1:3] 130 131 132
## .. ..$ : int [1:3] 133 134 135
## .. ..$ : int [1:3] 136 137 138
## .. ..$ : int [1:3] 139 140 141
## .. ..$ : int [1:3] 142 143 144
## .. ..$ : int [1:3] 145 146 147
## .. ..$ : int [1:3] 148 149 150
## .. ..$ : int [1:3] 151 152 153
## .. ..$ : int [1:3] 154 155 156
## .. ..$ : int [1:3] 157 158 159
## .. ..$ : int [1:3] 160 161 162
## .. ..$ : int [1:3] 163 164 165
## .. ..$ : int [1:3] 166 167 168
## .. ..$ : int [1:3] 169 170 171
## .. ..$ : int [1:3] 172 173 174
## .. ..$ : int [1:3] 175 176 177
## .. ..$ : int [1:3] 178 179 180
## .. ..$ : int [1:3] 181 182 183
## .. ..$ : int [1:3] 184 185 186
## .. ..$ : int [1:3] 187 188 189
## .. ..$ : int [1:3] 190 191 192
## .. ..$ : int [1:3] 193 194 195
## .. ..$ : int [1:3] 196 197 198
## .. ..$ : int [1:3] 199 200 201
## .. ..$ : int [1:3] 202 203 204
## .. ..$ : int [1:3] 205 206 207
## .. ..$ : int [1:3] 208 209 210
## .. ..$ : int [1:3] 211 212 213
## .. ..$ : int [1:3] 214 215 216
## .. ..$ : int [1:3] 217 218 219
## .. ..$ : int [1:3] 220 221 222
## .. ..$ : int [1:3] 223 224 225
## .. ..$ : int [1:3] 226 227 228
## .. ..$ : int [1:3] 229 230 231
## .. ..$ : int [1:3] 232 233 234
## .. ..$ : int [1:3] 235 236 237
## .. ..$ : int [1:3] 238 239 240
## .. ..$ : int [1:3] 241 242 243
## .. ..$ : int [1:3] 244 245 246
## .. ..$ : int [1:3] 247 248 249
## .. ..$ : int [1:3] 250 251 252
## .. ..$ : int [1:3] 253 254 255
## .. ..$ : int [1:3] 256 257 258
## .. ..$ : int [1:3] 259 260 261
## .. ..$ : int [1:3] 262 263 264
## .. ..$ : int [1:3] 265 266 267
## .. ..$ : int [1:3] 268 269 270
## .. ..$ : int [1:3] 271 272 273
## .. ..$ : int [1:3] 274 275 276
## .. ..$ : int [1:3] 277 278 279
## .. ..$ : int [1:3] 280 281 282
## .. ..$ : int [1:3] 283 284 285
## .. ..$ : int [1:3] 286 287 288
## .. ..$ : int [1:3] 289 290 291
## .. ..$ : int [1:3] 292 293 294
## .. ..$ : int [1:3] 295 296 297
## .. .. [list output truncated]
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
str(rd)
## 'data.frame': 3660 obs. of 5 variables:
## $ continent : chr "Americas" "Western Pacific" "South-East Asia" "Western Pacific" ...
## $ code : chr "ATG" "FSM" "MDV" "KIR" ...
## $ country : chr "Antigua and Barbuda" "Micronesia (Federated States of)" "Maldives" "Kiribati" ...
## $ year : int 2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
## $ road_traffic_death_rate: num 0 0.16 1.63 1.92 10.1 ...
str(health)
## 'data.frame': 3661 obs. of 5 variables:
## $ countries : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ year : int 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 ...
## $ probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_both : num 35.3 35.4 35.5 35.6 35.6 35.7 36.2 36.6 37.1 37.8 ...
## $ probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_male : num 34.4 34.7 35 35.1 35.4 35.7 36.2 36.8 37.5 38.3 ...
## $ probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_female: num 36.2 36 35.9 36 35.7 35.7 36.1 36.3 36.8 37.4 ...
The analysis then counted the number of unique countries (or country codes) in each data frame.
n_distinct(agg$code)
## [1] 205
n_distinct(su.r$country)
## [1] 183
n_distinct(health$countries)
## [1] 184
n_distinct(rd$country)
## [1] 183
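The counts differ between data frames, so some countries will not match when the tables are merged later. One way to list the mismatches is ‘setdiff’. A base R sketch on invented code vectors (the code "ZZZ" is made up):

```r
# Invented demo vectors of country codes; "ZZZ" is a made-up code.
codes_a <- c("AFG", "CHL", "KEN", "ZZZ")
codes_b <- c("AFG", "CHL", "KEN")

only_in_a <- setdiff(codes_a, codes_b)  # present in a, missing from b
only_in_b <- setdiff(codes_b, codes_a)  # present in b, missing from a

only_in_a
length(only_in_b)
```

Applied to the real data, the same call on columns such as agg$code and su.r$code would show which entries an inner join will drop.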
Finally, summary statistics were produced for each data frame.
summary(agg)
## index year country code
## Min. : 1 Min. :1990 Length:6150 Length:6150
## 1st Qu.:13789 1st Qu.:1997 Class :character Class :character
## Median :28284 Median :2004 Mode :character Mode :character
## Mean :28149 Mean :2004
## 3rd Qu.:42297 3rd Qu.:2012
## Max. :56164 Max. :2019
## population_historical_estimates
## Min. :1.126e+03
## 1st Qu.:1.129e+06
## Median :6.077e+06
## Mean :6.350e+07
## 3rd Qu.:2.091e+07
## Max. :7.713e+09
## prevalence_anxiety_disorders_sex_male_age_age_standardized_percent
## Min. :1.396
## 1st Qu.:2.786
## Median :3.154
## Mean :3.275
## 3rd Qu.:3.582
## Max. :6.504
## prevalence_anxiety_disorders_sex_female_age_age_standardized_percent
## Min. : 2.425
## 1st Qu.: 4.241
## Median : 5.046
## Mean : 5.303
## 3rd Qu.: 5.984
## Max. :12.065
## prevalence_anxiety_disorders_sex_both
## Min. : 3.904
## 1st Qu.: 7.082
## Median : 8.135
## Mean : 8.578
## 3rd Qu.: 9.597
## Max. :17.794
## prevalence_bipolar_disorder_sex_male_age_age_standardized_percent
## Min. :0.1838
## 1st Qu.:0.5294
## Median :0.5859
## Mean :0.6375
## 3rd Qu.:0.8389
## Max. :1.5849
## prevalence_bipolar_disorder_sex_female_age_age_standardized_percent
## Min. :0.1931
## 1st Qu.:0.5433
## Median :0.6069
## Mean :0.7045
## 3rd Qu.:0.9557
## Max. :1.7528
## prevalence_bipolar_disorder_both
## Min. :0.3801
## 1st Qu.:1.0812
## Median :1.1793
## Mean :1.3421
## 3rd Qu.:1.7966
## Max. :3.3377
## prevalence_depressive_disorders_sex_male_age_age_standardized_percent
## Min. :1.314
## 1st Qu.:2.608
## Median :3.018
## Mean :3.141
## 3rd Qu.:3.609
## Max. :7.259
## prevalence_depressive_disorders_sex_female_age_age_standardized_percent
## Min. :1.963
## 1st Qu.:3.806
## Median :4.673
## Mean :4.694
## 3rd Qu.:5.403
## Max. :8.977
## prevalence_depressive_disorders_both
## Min. : 3.277
## 1st Qu.: 6.393
## Median : 7.701
## Mean : 7.836
## 3rd Qu.: 9.059
## Max. :15.237
## prevalence_eating_disorders_sex_male_age_age_standardized_percent
## Min. :0.03390
## 1st Qu.:0.07322
## Median :0.10426
## Mean :0.12861
## 3rd Qu.:0.15227
## Max. :0.75303
## prevalence_eating_disorders_sex_female_age_age_standardized_percent
## Min. :0.05741
## 1st Qu.:0.12259
## Median :0.19404
## Mean :0.28510
## 3rd Qu.:0.35274
## Max. :1.51230
## prevalence_eating_disorders_both
## Min. :0.0915
## 1st Qu.:0.1961
## Median :0.3016
## Mean :0.4137
## 3rd Qu.:0.5033
## Max. :2.2653
## prevalence_schizophrenia_sex_male_age_age_standardized_percent
## Min. :0.1961
## 1st Qu.:0.2647
## Median :0.3013
## Mean :0.2947
## 3rd Qu.:0.3212
## Max. :0.5218
## prevalence_schizophrenia_sex_female_age_age_standardized_percent
## Min. :0.1863
## 1st Qu.:0.2373
## Median :0.2725
## Mean :0.2640
## 3rd Qu.:0.2889
## Max. :0.4933
## prevalence_schizophrenia_both
## Min. :0.3832
## 1st Qu.:0.5026
## Median :0.5730
## Mean :0.5587
## 3rd Qu.:0.6082
## Max. :1.0114
summary(su.r)
## continent code country year
## Length:10980 Length:10980 Length:10980 Min. :2000
## Class :character Class :character Class :character 1st Qu.:2005
## Mode :character Mode :character Mode :character Median :2010
## Mean :2010
## 3rd Qu.:2014
## Max. :2019
## sex suicide_rate
## Length:10980 Min. : 0.00
## Class :character 1st Qu.: 4.63
## Mode :character Median : 8.27
## Mean : 11.96
## 3rd Qu.: 14.94
## Max. :195.20
summary(rd)
## continent code country year
## Length:3660 Length:3660 Length:3660 Min. :2000
## Class :character Class :character Class :character 1st Qu.:2005
## Mode :character Mode :character Mode :character Median :2010
## Mean :2010
## 3rd Qu.:2014
## Max. :2019
## road_traffic_death_rate
## Min. : 0.00
## 1st Qu.:11.25
## Median :16.76
## Mean :18.05
## 3rd Qu.:25.19
## Max. :64.60
summary(health)
## countries year
## Length:3661 Min. :2000
## Class :character 1st Qu.:2005
## Mode :character Median :2010
## Mean :2010
## 3rd Qu.:2014
## Max. :2019
## NA's :1
## probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_both
## Min. : 7.30
## 1st Qu.:16.10
## Median :21.70
## Mean :22.06
## 3rd Qu.:26.60
## Max. :56.00
## NA's :1
## probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_male
## Min. : 9.60
## 1st Qu.:18.00
## Median :24.50
## Mean :25.64
## 3rd Qu.:31.50
## Max. :64.10
## NA's :1
## probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_female
## Min. : 4.40
## 1st Qu.:13.10
## Median :18.30
## Mean :18.67
## 3rd Qu.:23.30
## Max. :47.80
## NA's :1
# Visualizing the data.
# Population stats.
## Determining the global population.
The global average population has been rising constantly since the 1990s.
The average population in Kenya has also risen constantly since 1990. This indicates an increase in available labour; however, a look at the age distribution of the population is needed to determine its true labour productivity.
In 2019, the global average suicide rate was significantly higher for males than for females.
The latest reported suicide rates indicate that Kenya has a higher average suicide rate than the global average for both males and females; moreover, the gap above the global average is larger for males than for females.
Global suicide rates have been falling significantly over the period covered by the data (2000 to 2019).
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
Global suicide rates have been dropping for both sexes; however, the decrease has been significantly larger for men than for women.
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
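The plotting code behind these trend charts is not shown in this report. A minimal sketch of how a yearly mean by sex could be computed and drawn, assuming dplyr and ggplot2 are installed and using an invented data frame whose column names mirror su.r (all values are made up):

```r
library(dplyr)
library(ggplot2)

# Invented demo data: two countries, two years, two sexes.
demo <- data.frame(
  country = rep(c("Kenya", "Chile"), each = 4),
  year = rep(c(2018, 2018, 2019, 2019), times = 2),
  sex = rep(c("Male", "Female"), times = 4),
  suicide_rate = c(9.0, 3.1, 9.3, 3.2, 13.0, 3.0, 13.4, 3.1)
)

# Average across countries for each year and sex.
trend <- demo %>%
  group_by(year, sex) %>%
  summarise(mean_rate = mean(suicide_rate), .groups = "drop")

# One line per sex; the plot object is built but not printed here.
p <- ggplot(trend, aes(x = year, y = mean_rate, colour = sex)) +
  geom_line() +
  labs(x = "Year", y = "Mean suicide rate", colour = "Sex")
```

Filtering the input to a single country before the ‘group_by’ step would give the corresponding country-level chart.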
The Kenyan average suicide rate has dropped significantly since 2000; however, it has risen consistently since 2016.
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
Kenyan suicide rates have been dropping for both sexes since 2000. Because male suicide rates are significantly higher, the decrease has been larger for males than for females. Since 2016, however, a significant increase has been observed for both sexes, especially males.
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
The global average probability of dying between ages 30 and 70 from cardiovascular disease, cancer, diabetes, or chronic respiratory disease has been falling significantly since 2000.
## Warning: Removed 1 row containing missing values (`geom_line()`).
The probability of death has been significantly higher for males than for females; however, it has dropped for both sexes.
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Removed 1 row containing missing values (`geom_line()`).
The probability of death from the stated diseases in Kenya has fluctuated since 2000; it dropped from 2008 to 2016, and a slight increase was observed after 2017.
As expected, the trend for each sex in Kenya also fluctuated; however, the probability of death was significantly higher for men than for women.
The prevalence of alcohol and substance use disorders increased from 1990 to 2000, declined significantly from 2000 to 2010, and then increased significantly until 2019, with Russia having the highest prevalence globally.
The Kenyan average prevalence of alcohol and substance use disorders also increased significantly from 1990 to 2000, rose only slightly until 2005, decreased until 2010, and has increased sharply since then, mirroring the global trend.
The global average prevalence of anxiety disorders has been on a consistent rise since 1990.
The global average prevalence of anxiety disorders in males has been on a consistent rise since 1990.
The global average prevalence of anxiety disorders in females has similarly been on a consistent rise since 1990.
There is a significant gap in the prevalence of anxiety disorders between the sexes, with prevalence higher among females than males.
The Kenyan average prevalence of anxiety disorders rose consistently from 1990 to 2017, after which a decrease was observed until 2019.
The Kenyan average prevalence of anxiety disorders in males has increased significantly since 1990; it dropped only from 2006 to 2009 and has risen significantly ever since.
The Kenyan average prevalence of anxiety disorders in females has fluctuated over the years; a significant dip was observed from 2000 to 2005, the years in which the average prevalence among males was highest. From 2017 to 2019, the reported prevalence decreased only for females.
Although the prevalence of anxiety disorders in females has been dropping in recent years, reported levels are still significantly higher in females than in males.
The global average prevalence of bipolar disorder has been rising significantly since 1990.
The global average prevalence of bipolar disorder in males has risen since 1990, as expected.
The global average prevalence of bipolar disorder in females has also risen since 1990, showing a similar trend to that in males.
The reported levels of bipolar disorder have been significantly higher in females than in males.
Bipolar disorder prevalence in Kenya rose steadily from 1990 to 2000; thereafter, much sharper growth was observed from 2000 to 2015, followed by a plateau.
Bipolar disorder prevalence among Kenyan males follows the same trend as the aggregated Kenyan pattern, with a plateau in recent years; however, the male-only data shows a more pronounced decrease since 2015 than the female data.
Bipolar disorder prevalence among Kenyan females follows the same trend as the aggregated Kenyan pattern, with a plateau observed in recent years.
Unlike the global data, the Kenyan analysis shows that bipolar disorder is more prevalent in males than in females; however, the gap between the sexes is significantly smaller locally than globally.
The global prevalence of depressive disorders decreased significantly from 2000 to 2017, followed by a significant increase until 2019.
The global prevalence of depressive disorders in males, like the aggregated data, decreased significantly from 2000 to 2017, followed by a steeper increase until 2019.
The global prevalence of depressive disorders in females, like the aggregated data, decreased significantly from 2000 to 2017, followed by a less steep rise until 2019.
The reported prevalence of depressive disorders has been significantly higher in females than in males globally.
The Kenyan prevalence of depressive disorders, like the global data, decreased significantly from 2000 to 2017, followed by a slight increase until 2019.
The Kenyan prevalence of depressive disorders in males, like the aggregated Kenyan data, decreased significantly from 2000 to 2017, followed by a slight increase until 2019.
The Kenyan prevalence of depressive disorders in females, like the aggregated Kenyan data, decreased significantly from 2000 to 2017, followed by a slight increase until 2019.
The reported Kenyan prevalence of depressive disorders has also been significantly higher in females than in males; moreover, the gap between the sexes is larger in Kenya than globally.
The global average prevalence of eating disorders has been rising significantly since 1990.
The global average prevalence of eating disorders in males has also been rising significantly since 1990.
The global average prevalence of eating disorders in females has similarly been rising significantly since 1990.
There has been a significantly higher reported prevalence of eating disorders in females than in males globally.
In contrast to the global pattern, the prevalence of eating disorders in Kenya decreased significantly from 1990 to 2004, after which a significant rise was observed until 2019.
As expected, the prevalence of eating disorders among males in Kenya decreased significantly from 1990 to 2004, after which a significant rise was observed until 2019.
Similarly, the prevalence of eating disorders among females in Kenya decreased significantly from 1990 to 2004, after which a significant rise was observed until 2019.
In Kenya, the reported prevalence of eating disorders has also been significantly higher in females than in males; however, the gap between the sexes is smaller than the global one.
The global average prevalence of schizophrenia has been rising significantly since 1990.
As expected, the global average prevalence of schizophrenia in males has risen significantly since 1990.
Similarly, the global average prevalence of schizophrenia in females has risen significantly since 1990.
The average prevalence of reported cases of schizophrenia has been significantly higher in males than in females.
The Kenyan prevalence of schizophrenia has been increasing in an uneven but consistent pattern since 1990.
The Kenyan prevalence of schizophrenia in males has been increasing in an uneven but consistent pattern since 1990. A notable observation is the sharp rise from 2000.
The Kenyan prevalence of schizophrenia in females has also been increasing in an uneven but consistent pattern since 1990. A notable observation is the steadier rise from 2000, as opposed to the sharp rise in males.
In Kenya, the average prevalence of reported cases of schizophrenia has also been significantly higher in males than in females; however, the gap between the two sexes is smaller in Kenya than globally.
The global mean road traffic death rate was fairly steady from 2000 to 2007; a sharp decline followed over the next 7 years, after which there were smaller rises and falls until 2019.
Contrary to the global road traffic death pattern, the death rate in Kenya has been observed to increase from 2010 to 2019.
# Health variable correlation.
This began by aggregating all of the health variables in the analysis as follows.
hagg <- merge(ab, agg, by = c("country", "year", "code"))
health <- health %>%
rename(country = countries)
hagg <- merge(hagg, health, by = c("country", "year"))
hagg <- merge(hagg, su.r, by = c("country", "year", "code"))
hagg <- merge(hagg, rd, by = c("country", "year", "code"))
The analysis then proceeded to remove any unnecessary or excess data.
## Removing unnecessary variables from data frame.
hagg <- hagg[,-28]
hagg <- hagg[,-25:-26]
hagg <- hagg [,-5]
## Removing excessive variables including non-numerical variables.
nhagg <- hagg[,-1:-3]
nhagg <- nhagg[,-2:-4]
nhagg <- nhagg[,-3:-4]
nhagg <- nhagg[,-4:-5]
nhagg <- nhagg[,-5:-6]
nhagg <- nhagg[,-6:-7]
nhagg <- nhagg[,-8:-9]
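Dropping columns by position depends on the exact column order after every merge, so an alternative is to keep the wanted columns by name. A base R sketch on an invented one-row data frame with hypothetical column names:

```r
# Invented demo row; the column names are hypothetical.
demo <- data.frame(country = "Kenya", year = 2019, code = "KEN",
                   anxiety_male = 3.1, anxiety_female = 5.0,
                   anxiety_both = 8.1, suicide_rate = 6.1)

# Keep only the named numeric columns needed for the correlation step.
keep <- c("anxiety_both", "suicide_rate")
numeric_only <- demo[, keep, drop = FALSE]

names(numeric_only)
```

Selecting by name keeps working if a column is later added to or removed from the source tables, whereas index-based drops would silently shift.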
Thereafter the health variables were renamed in order to fit properly on the heat map.
nhagg <- nhagg %>%
rename(aldis = prevalence_alcohol_and_substance_use_disorders_both_age_standardized_percent) %>%
rename(andis = prevalence_anxiety_disorders_sex_both) %>%
rename(bidis = prevalence_bipolar_disorder_both) %>%
rename(depdis = prevalence_depressive_disorders_both) %>%
rename(eddis = prevalence_eating_disorders_both) %>%
rename(schdis = prevalence_schizophrenia_both) %>%
rename(dydis = probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_both) %>%
rename(sui = suicide_rate) %>%
rename(rdth = road_traffic_death_rate)
The pairwise Pearson correlations between the variables were calculated; the output below shows the first six rows of the matrix.
corhagg <- round(cor(nhagg),2)
head(corhagg)
## aldis andis bidis depdis eddis schdis dydis sui rdth
## aldis 1.00 0.08 0.24 -0.06 0.23 0.36 -0.09 0.20 -0.24
## andis 0.08 1.00 0.65 0.09 0.66 0.42 -0.47 -0.18 -0.35
## bidis 0.24 0.65 1.00 0.09 0.74 0.36 -0.61 -0.18 -0.31
## depdis -0.06 0.09 0.09 1.00 -0.01 -0.46 0.13 0.20 0.43
## eddis 0.23 0.66 0.74 -0.01 1.00 0.60 -0.64 -0.14 -0.53
## schdis 0.36 0.42 0.36 -0.46 0.60 1.00 -0.45 -0.19 -0.70
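By default ‘cor’ returns NA for any pair of variables that contains a missing value. Here the inner joins and ‘na.omit’ calls appear to have removed such rows already, but if missing values remained, the ‘use’ argument controls how they are handled. A base R sketch with invented vectors:

```r
# Invented demo vectors; x has one missing value.
x <- c(1, 2, 3, 4, NA)
y <- c(2.1, 3.9, 6.2, 8.1, 10.0)

r_default  <- cor(x, y)                        # NA because of the missing value
r_complete <- cor(x, y, use = "complete.obs")  # uses only the complete pairs

r_default
round(r_complete, 2)
```

For a whole matrix, use = "pairwise.complete.obs" computes each correlation from the rows available for that pair.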
The correlation matrix was then melted into long format, with one row per pair of variables, for plotting.
melted_corhagg <- melt(corhagg)
head(melted_corhagg)
## Var1 Var2 value
## 1 aldis aldis 1.00
## 2 andis aldis 0.08
## 3 bidis aldis 0.24
## 4 depdis aldis -0.06
## 5 eddis aldis 0.23
## 6 schdis aldis 0.36
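The long format produced by ‘melt’ can also be obtained in base R, which may help to see what the reshaping does. A sketch on a small invented 2 x 2 matrix; the renaming step is only there to match the column names shown above:

```r
# Invented 2 x 2 correlation-style matrix with row and column names.
m <- matrix(c(1, 0.5, 0.5, 1), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

# as.table() + as.data.frame() yields one row per matrix cell.
long <- as.data.frame(as.table(m))
names(long) <- c("Var1", "Var2", "value")

long
```

Each row holds the row name, the column name, and the cell value, which is the shape ggplot2 expects for a tile plot.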
The heat map was created to visualize the correlations.
ggplot(melted_corhagg, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed() +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 3)
The heat map was then refined to be more visually appealing.
# Get lower triangle of the correlation matrix
get_lower_tri<-function(corhagg){
corhagg[upper.tri(corhagg)] <- NA
return(corhagg)
}
# Get upper triangle of the correlation matrix
get_upper_tri <- function(corhagg){
corhagg[lower.tri(corhagg)]<- NA
return(corhagg)
}
upper_tri <- get_upper_tri(corhagg)
upper_tri
## aldis andis bidis depdis eddis schdis dydis sui rdth
## aldis 1 0.08 0.24 -0.06 0.23 0.36 -0.09 0.20 -0.24
## andis NA 1.00 0.65 0.09 0.66 0.42 -0.47 -0.18 -0.35
## bidis NA NA 1.00 0.09 0.74 0.36 -0.61 -0.18 -0.31
## depdis NA NA NA 1.00 -0.01 -0.46 0.13 0.20 0.43
## eddis NA NA NA NA 1.00 0.60 -0.64 -0.14 -0.53
## schdis NA NA NA NA NA 1.00 -0.45 -0.19 -0.70
## dydis NA NA NA NA NA NA 1.00 0.38 0.37
## sui NA NA NA NA NA NA NA 1.00 0.20
## rdth NA NA NA NA NA NA NA NA 1.00
# Melt the correlation matrix
melted_corhagg_u <- melt(upper_tri, na.rm = TRUE)
# Create a correlation Heat map
ggplot(melted_corhagg_u, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()+
geom_text(aes(Var2, Var1, label = value), color = "black", size = 3) +
labs(title = "Global Correlation Heat Map Of Health Variables")+
theme(plot.title = element_text(hjust = 0.5, face="bold")) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
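The triangle helpers above are written against `corhagg` and are duplicated later for the Kenyan matrix. A minimal sketch of a single helper with a neutral parameter name, which could serve both matrices, is shown below. The function name `upper_triangle` and the 3x3 matrix are hypothetical.

```r
# Keep only the upper triangle (including the diagonal) of a square matrix.
upper_triangle <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  cormat
}

# Hypothetical 3x3 correlation-like matrix.
m <- matrix(c(1,   0.2, 0.3,
              0.2, 1,   0.4,
              0.3, 0.4, 1), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

u <- upper_triangle(m)

# For an n x n matrix, n * (n + 1) / 2 cells remain after masking.
sum(!is.na(u))
```

Masking one triangle removes the mirrored duplicates, so each pair of variables appears once in the plot.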
The analysis went further and created a correlation heat map of variables in Kenya.
khagg <- hagg %>%
filter(country == "Kenya")
## Removing excessive variables including non-numerical variables.
knhagg <- khagg[,-1:-3]
knhagg <- knhagg[,-2:-4]
knhagg <- knhagg[,-3:-4]
knhagg <- knhagg[,-4:-5]
knhagg <- knhagg[,-5:-6]
knhagg <- knhagg[,-6:-7]
knhagg <- knhagg[,-8:-9]
## Renaming the variables to short forms.
knhagg <- knhagg %>%
rename(aldis = prevalence_alcohol_and_substance_use_disorders_both_age_standardized_percent) %>%
rename(andis = prevalence_anxiety_disorders_sex_both) %>%
rename(bidis = prevalence_bipolar_disorder_both) %>%
rename(depdis = prevalence_depressive_disorders_both) %>%
rename(eddis = prevalence_eating_disorders_both) %>%
rename(schdis = prevalence_schizophrenia_both) %>%
rename(dydis = probability_of_dying_between_age_30_and_exact_age_70_from_any_of_cardiovascular_disease_cancer_diabetes_or_chronic_respiratory_disease_both) %>%
rename(sui = suicide_rate) %>%
rename(rdth = road_traffic_death_rate)
## Finding the correlations between variables and rounding to 2 decimal places.
corkhagg <- round(cor(knhagg),2)
head(corkhagg)
## aldis andis bidis depdis eddis schdis dydis sui rdth
## aldis 1.00 0.67 0.45 -0.47 0.80 0.65 -0.83 -0.11 0.89
## andis 0.67 1.00 0.96 -0.97 0.97 0.92 -0.48 -0.20 0.59
## bidis 0.45 0.96 1.00 -1.00 0.87 0.91 -0.27 -0.21 0.37
## depdis -0.47 -0.97 -1.00 1.00 -0.88 -0.90 0.29 0.20 -0.40
## eddis 0.80 0.97 0.87 -0.88 1.00 0.86 -0.63 -0.18 0.74
## schdis 0.65 0.92 0.91 -0.90 0.86 1.00 -0.43 -0.21 0.46
## Melt the correlation matrix into long format for the heat map.
melted_corkhagg <- melt(corkhagg)
head(melted_corkhagg)
## Var1 Var2 value
## 1 aldis aldis 1.00
## 2 andis aldis 0.67
## 3 bidis aldis 0.45
## 4 depdis aldis -0.47
## 5 eddis aldis 0.80
## 6 schdis aldis 0.65
# Create a heat map.
ggplot(melted_corkhagg, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed() +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4)
# Get lower triangle of the correlation matrix
get_lower_tri_k <-function(corkhagg){
corkhagg[upper.tri(corkhagg)] <- NA
return(corkhagg)
}
# Get upper triangle of the correlation matrix
get_upper_tri_k <- function(corkhagg){
corkhagg[lower.tri(corkhagg)]<- NA
return(corkhagg)
}
upper_tri_k <- get_upper_tri_k(corkhagg)
upper_tri_k
## aldis andis bidis depdis eddis schdis dydis sui rdth
## aldis 1 0.67 0.45 -0.47 0.80 0.65 -0.83 -0.11 0.89
## andis NA 1.00 0.96 -0.97 0.97 0.92 -0.48 -0.20 0.59
## bidis NA NA 1.00 -1.00 0.87 0.91 -0.27 -0.21 0.37
## depdis NA NA NA 1.00 -0.88 -0.90 0.29 0.20 -0.40
## eddis NA NA NA NA 1.00 0.86 -0.63 -0.18 0.74
## schdis NA NA NA NA NA 1.00 -0.43 -0.21 0.46
## dydis NA NA NA NA NA NA 1.00 0.09 -0.76
## sui NA NA NA NA NA NA NA 1.00 -0.07
## rdth NA NA NA NA NA NA NA NA 1.00
# Melt the correlation matrix
melted_corhagg_u_k <- melt(upper_tri_k, na.rm = TRUE)
# Create a correlation Heat map
ggplot(melted_corhagg_u_k, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()+
geom_text(aes(Var2, Var1, label = value), color = "black", size = 3) +
labs(title = "Correlation Heat Map Of Health Variables In Kenya")+
theme(plot.title = element_text(hjust = 0.5, face="bold")) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
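The global and Kenyan heat maps share almost all of their plotting code. One possible way to avoid the duplication is a small helper that takes the melted data and a title. The sketch below is an assumption about how that could look, not the code used for the figures above: `plot_cor_heatmap` is a hypothetical name, the toy input is invented, and the legend placement is left at the default.

```r
library(ggplot2)

# Hypothetical helper: draws a correlation heat map from a long data frame
# with columns Var1, Var2 and value (the shape produced by melting a matrix).
plot_cor_heatmap <- function(melted, title) {
  ggplot(melted, aes(Var2, Var1, fill = value)) +
    geom_tile(color = "white") +
    geom_text(aes(label = value), color = "black", size = 3) +
    scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                         midpoint = 0, limits = c(-1, 1),
                         name = "Pearson\nCorrelation") +
    coord_fixed() +
    labs(title = title, x = NULL, y = NULL) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
          plot.title = element_text(hjust = 0.5, face = "bold"))
}

# Toy input with hypothetical values.
toy <- data.frame(
  Var1 = c("a", "b", "a", "b"),
  Var2 = c("a", "a", "b", "b"),
  value = c(1, 0.5, 0.5, 1)
)
p <- plot_cor_heatmap(toy, "Toy Correlation Heat Map")
```

With such a helper, each figure reduces to one call with its own melted data frame and title.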
-There is a higher average suicide rate in men than in women, both globally and in Kenya.
-Kenya has a higher average suicide rate for both genders than the global average as of 2019.
-The average probability of death between the ages of 30 and 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease in Kenya decreased from 2008; however, an increase has been observed since 2016.
-There has been a significant increase in the prevalence of alcohol and substance use disorders in Kenya. Moreover, according to the Kenyan correlation heat map, this prevalence rises with the prevalence of anxiety disorders (0.67), eating disorders (0.80) and schizophrenia (0.65).
-As the correlation heat maps show, the prevalences of various mental health issues are related to each other. For example, the prevalence of anxiety disorders increases with the prevalence of bipolar disorder and eating disorders globally (0.65 and 0.66), and the same is observed in Kenya (0.96 and 0.97). In Kenya it also increases with the prevalence of schizophrenia (0.92) and decreases as the prevalence of depressive disorders increases (-0.97).
-The global probability of dying between age 30 and 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease has been falling significantly since the 2000s; in Kenya, however, it has only been decreasing steadily. Men are also at a higher risk of death from these diseases than women, both globally and in Kenya.
-In Kenya, the probability of dying between age 30 and 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease is negatively correlated with the prevalence of alcohol and substance use disorders (-0.83), anxiety disorders (-0.48) and eating disorders (-0.63).
-Suicide rates show no strong direct relationship with any mental health disorder, either globally or in Kenya. In the global analysis, their strongest correlation (0.38) is with the probability of dying between age 30 and 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease.
-Globally, the road traffic death rate is negatively correlated with the prevalence of eating disorders (-0.53). There is also a moderate positive correlation between depressive disorders and the road traffic death rate (0.43), which should be an indicator for further research.
-In Kenya, road traffic deaths are negatively correlated with the probability of dying between age 30 and 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease (-0.76), but positively correlated with eating disorders (0.74), anxiety disorders (0.59) and alcohol and substance use disorders (0.89).
-Depressive disorders are the only mental health variable positively correlated with suicide rates in Kenya, although the correlation is weak (0.20). Further study can therefore look into the impact of depression on attempted and reported suicide cases, and into other factors that may influence suicide, which has been on the rise in Kenya.
-Road traffic accidents have a significant impact on the mental health of several Kenyans.
-There should be more education regarding suicide prevention for NGOs.
-Corporate firms should look into providing health insurance that covers the observed health issues, mental health days off, and additional sick days for those affected.
-The government can implement policies and programs that provide more efficient facilities for the handling of observed health issues.
-Insurance companies can restructure their products around the knowledge that mental health issues in Kenya have a significant direct relationship with each other, and that the prevalence of alcohol and substance use disorders is strongly correlated with the road traffic death rate in Kenya (0.89).
-The government should look critically at the increase in the prevalence of alcohol and substance use disorders in Kenya, especially its impact on mental health and road safety.
-Corporate entities can support any employee affected by a road accident death, in order to reduce the risk of decreased productivity resulting from emerging mental health issues in their employees.
-Research should be conducted into why the probability of death between the ages of 30 and 70 from any of cardiovascular disease, cancer, diabetes or chronic respiratory disease in Kenya has been on the rise since 2016 while it has been decreasing globally.
-The impact of depression on attempted and reported suicide cases, and other factors that may influence suicide, which has been on the rise in Kenya.
-Why men have a higher suicide rate than women, and the factors that have led to a higher suicide rate in Kenya than globally.
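Several of the points above describe correlations as significant. A correlation coefficient on its own does not establish statistical significance, particularly when a single-country subset contains few observations. A minimal sketch of how a pair of variables could be tested with base R's `cor.test()` follows. The two series are simulated, hypothetical data, not values from the dataset.

```r
# Hypothetical toy series standing in for two of the renamed variables.
set.seed(1)
x <- 1:20
y <- x + rnorm(20, sd = 5)

# Pearson correlation test of H0: the true correlation is zero.
res <- cor.test(x, y, method = "pearson")

res$estimate   # sample correlation coefficient
res$p.value    # p-value of the test
res$conf.int   # 95% confidence interval for the correlation
```

Reporting the p-value or confidence interval alongside each coefficient would make the strength of the evidence explicit.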